The multi-agent framework space exploded in 2025. OpenAI released its Agents SDK in March, Google released its Agent Development Kit (ADK) in April, Anthropic released the Claude Agent SDK in September alongside Claude Sonnet 4.5, and LangGraph and CrewAI matured through multiple production iterations. Here is the current landscape, followed by the underlying concepts that apply regardless of framework.
1. The Framework Landscape: June 2026
Need stateful, auditable, conditional routing? → LangGraph
Need a working prototype in 2-4 hours? → CrewAI
Already on OpenAI, need guardrails + tracing? → OpenAI Agents SDK
Building primarily on Claude? → Anthropic Agent SDK
Need to run across multiple LLM providers? → LangGraph or CrewAI (provider-agnostic)
Teams often prototype in CrewAI to validate the workflow. They then migrate to LangGraph once they need production-grade state management: checkpointing, conditional branching, and LangSmith observability.
2. The 5 Dominant Patterns in 2026
Five patterns dominate production multi-agent systems: supervisor, pipeline, fan-out, maker-checker, and swarm. Most production systems combine two or three of them.
- Supervisor: A manager agent routes work to specialized workers. Widest native framework support. Best-understood failure modes. Start here.
- Pipeline: Agents chained in a fixed order for deterministic workloads, for example Researcher → Architect → Coder → Auditor. Predictable, easy to debug.
- Fan-out: One agent triggers multiple parallel workers, then collects results. Good for parallel research, multi-pass auditing.
- Maker-Checker: An actor agent paired with a verifier agent. The verifier checks every output before it moves forward. This catches hallucinations before they propagate, at the cost of at least one extra model call per step. Worth it on high-stakes tasks.
- Swarm: Peer agents communicate freely with a shared scratchpad. Most flexible, hardest to debug. Use only when the other patterns do not fit.
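Two of these patterns are easy to blur in prose, so here is a framework-free sketch of fan-out and maker-checker. It assumes agents are plain Python callables standing in for LLM calls. The names `fan_out`, `maker_checker`, and the stub agents are hypothetical and do not belong to any framework's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical agent signature: input text in, output text out (stands in for an LLM call)
Agent = Callable[[str], str]


def fan_out(task: str, workers: list[Agent]) -> list[str]:
    """Fan-out: run every worker on the same task in parallel; results keep worker order."""
    with ThreadPoolExecutor(max_workers=max(1, len(workers))) as pool:
        return list(pool.map(lambda worker: worker(task), workers))


def maker_checker(task: str,
                  maker: Callable[[str, str], str],
                  checker: Callable[[str], Optional[str]],
                  max_rounds: int = 3) -> str:
    """Maker-checker: the checker returns None to approve a draft, or feedback to revise it."""
    feedback = ""
    for _ in range(max_rounds):
        draft = maker(task, feedback)
        verdict = checker(draft)
        if verdict is None:
            return draft
        feedback = verdict
    raise RuntimeError(f"checker rejected {max_rounds} drafts; last feedback: {feedback}")


# Stub agents for illustration only
def maker(task: str, feedback: str) -> str:
    return task.upper() if feedback else task  # "revises" once it receives feedback


def checker(draft: str) -> Optional[str]:
    return None if draft.isupper() else "use upper case"


print(fan_out("auth.py", [lambda t: f"lint:{t}", lambda t: f"security:{t}"]))
# → ['lint:auth.py', 'security:auth.py']
print(maker_checker("ship it", maker, checker))
# → SHIP IT
```

The bounded `max_rounds` loop is the important design choice. If the checker never approves, an unbounded maker-checker cycle keeps spending tokens until something else stops it.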
3. The Blackboard Pattern: Shared State Without Coupling
In a blackboard architecture, agents read from and write to a shared data store. No agent calls another agent directly; they communicate entirely through the blackboard. Any agent can fail, retry, or be replaced without the others knowing.
```javascript
import Database from 'better-sqlite3';

class Blackboard {
  constructor(dbPath) {
    this.db = new Database(dbPath);
    // Append-only log of artifacts; unixepoch() requires SQLite 3.38+
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS artifacts (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        run_id TEXT NOT NULL,
        agent TEXT NOT NULL,
        key TEXT NOT NULL,
        value TEXT NOT NULL,
        timestamp INTEGER DEFAULT (unixepoch())
      );
      CREATE INDEX IF NOT EXISTS idx_artifacts_lookup ON artifacts (run_id, agent, key);
    `);
  }

  // Writes never overwrite: each call appends a new version
  write(runId, agent, key, value) {
    this.db.prepare(
      'INSERT INTO artifacts (run_id, agent, key, value) VALUES (?, ?, ?, ?)'
    ).run(runId, agent, key, JSON.stringify(value));
  }

  // Returns the latest value for (runId, agent, key), or null if none exists
  read(runId, agent, key) {
    const row = this.db.prepare(
      'SELECT value FROM artifacts WHERE run_id=? AND agent=? AND key=? ORDER BY id DESC LIMIT 1'
    ).get(runId, agent, key);
    return row ? JSON.parse(row.value) : null;
  }
}
```
Use SQLite for a single-tenant engine running on one machine. Switch to Postgres when multiple machines need concurrent access to the same blackboard (SQLite allows only one writer at a time), or when the architecture is multi-tenant.
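To see the decoupling in action, here is a minimal sketch using Python's standard-library `sqlite3` module. It simplifies the JavaScript class above by keying artifacts on `(run_id, key)` only, so a reader asks for what it needs rather than who produced it. The agents, the `run_with_retry` helper, and the simulated timeout are hypothetical stand-ins for LLM-backed agents.

```python
import json
import sqlite3
from typing import Any, Callable


class Blackboard:
    """Append-only artifact store; reads return the latest value for a key."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS artifacts ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, run_id TEXT NOT NULL, "
            "key TEXT NOT NULL, value TEXT NOT NULL)"
        )

    def write(self, run_id: str, key: str, value: Any) -> None:
        self.conn.execute(
            "INSERT INTO artifacts (run_id, key, value) VALUES (?, ?, ?)",
            (run_id, key, json.dumps(value)),
        )
        self.conn.commit()

    def read(self, run_id: str, key: str) -> Any:
        row = self.conn.execute(
            "SELECT value FROM artifacts WHERE run_id=? AND key=? ORDER BY id DESC LIMIT 1",
            (run_id, key),
        ).fetchone()
        return json.loads(row[0]) if row else None


# Hypothetical agents: each one only reads inputs from and writes outputs to the board
failures = {"architect": 1}  # simulate one transient provider timeout


def researcher(bb: Blackboard, run_id: str) -> None:
    bb.write(run_id, "findings", ["use JWT", "rotate signing keys"])


def architect(bb: Blackboard, run_id: str) -> None:
    if failures["architect"] > 0:
        failures["architect"] -= 1
        raise TimeoutError("provider timeout")
    findings = bb.read(run_id, "findings")
    bb.write(run_id, "design", {"components": len(findings)})


def run_with_retry(agent: Callable[[Blackboard, str], None], bb: Blackboard,
                   run_id: str, attempts: int = 3) -> None:
    for attempt in range(attempts):
        try:
            agent(bb, run_id)
            return
        except Exception:
            if attempt == attempts - 1:
                raise


bb = Blackboard()
run_with_retry(researcher, bb, "run-1")
run_with_retry(architect, bb, "run-1")  # fails once, then succeeds on retry
print(bb.read("run-1", "design"))  # → {'components': 2}
```

The researcher never learns that the architect timed out. A replacement `architect` also needs only two things: it must read `findings` and write `design`.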
4. DAG Scheduler: Kahn's Algorithm
A DAG (directed acyclic graph) models agent dependencies: the Architect must run after the Researcher, and the Coder must run after the Architect. Kahn's algorithm computes a valid execution order at runtime and raises an error if the declared dependencies contain a cycle. To add a new agent, declare its dependencies and the scheduler places it automatically.
```python
from collections import deque


def topological_sort(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Kahn's algorithm: return an execution order that respects every (src, dst) edge."""
    in_degree = {n: 0 for n in nodes}
    graph = {n: [] for n in nodes}
    for src, dst in edges:
        graph[src].append(dst)  # dst depends on src
        in_degree[dst] += 1
    # Start with the agents that have no unmet dependencies
    queue = deque([n for n in nodes if in_degree[n] == 0])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    # Any node that never reached in-degree 0 sits on a cycle
    if len(order) != len(nodes):
        raise ValueError("Cycle detected in agent dependency graph")
    return order


# SDLC pipeline example
nodes = ["researcher", "architect", "coder", "auditor", "documenter", "committer"]
edges = [
    ("researcher", "architect"), ("architect", "coder"),
    ("coder", "auditor"), ("auditor", "documenter"),
    ("documenter", "committer"),
]
order = topological_sort(nodes, edges)
# → ['researcher', 'architect', 'coder', 'auditor', 'documenter', 'committer']
```
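`topological_sort` returns one flat order, so agents run strictly one after another. A common variation processes Kahn's algorithm level by level. This is sketched here as an assumption, not as part of the original design. Every agent in a wave has all its dependencies satisfied by earlier waves, so each wave can be dispatched with the fan-out pattern. The `execution_waves` name and the branching example graph are hypothetical.

```python
from collections import defaultdict


def execution_waves(nodes: list[str], edges: list[tuple[str, str]]) -> list[list[str]]:
    """Kahn's algorithm processed level by level; agents within a wave can run concurrently."""
    in_degree = {n: 0 for n in nodes}
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
        in_degree[dst] += 1
    wave = [n for n in nodes if in_degree[n] == 0]
    waves, scheduled = [], 0
    while wave:
        waves.append(wave)
        scheduled += len(wave)
        next_wave = []
        for node in wave:
            for neighbor in graph[node]:
                in_degree[neighbor] -= 1
                if in_degree[neighbor] == 0:
                    next_wave.append(neighbor)
        wave = next_wave
    if scheduled != len(nodes):
        raise ValueError("Cycle detected in agent dependency graph")
    return waves


# Hypothetical graph: a security auditor and a test writer both depend on the coder
nodes = ["researcher", "architect", "coder", "security_auditor", "test_writer", "committer"]
edges = [
    ("researcher", "architect"), ("architect", "coder"),
    ("coder", "security_auditor"), ("coder", "test_writer"),
    ("security_auditor", "committer"), ("test_writer", "committer"),
]
print(execution_waves(nodes, edges))
# → [['researcher'], ['architect'], ['coder'], ['security_auditor', 'test_writer'], ['committer']]
```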
5. Circuit Breaker: Preventing Cascade Failure
A circuit breaker wraps an LLM provider call and counts consecutive failures. When the count reaches a threshold, the breaker opens and stops sending requests to the failing provider. After a recovery timeout, it half-opens and lets one probe request through: success closes the circuit, and failure reopens it. Without circuit breakers, one degraded provider can stall your entire pipeline until requests time out. With them, calls to the failing provider fail fast, so the pipeline can route to a fallback provider immediately.
```python
from enum import Enum
from time import monotonic


class State(Enum):
    CLOSED = "closed"        # normal, requests flow through
    OPEN = "open"            # failing, requests blocked
    HALF_OPEN = "half_open"  # recovery probe


class CircuitOpenError(Exception):
    """Raised without calling the provider while the breaker is open."""


class CircuitBreaker:
    """Single-threaded sketch; concurrent callers would need a lock around state changes."""

    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.state = State.CLOSED
        self.failure_count = 0          # consecutive failures
        self.last_failure_time = None
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds

    def call(self, fn, *args, **kwargs):
        if self.state == State.OPEN:
            if monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = State.HALF_OPEN  # let one probe request through
            else:
                raise CircuitOpenError("Circuit open, provider bypassed")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = monotonic()
            # A failed probe reopens immediately; otherwise open at the threshold
            if self.state == State.HALF_OPEN or self.failure_count >= self.failure_threshold:
                self.state = State.OPEN
            raise
        # Any success, including a successful probe, closes the circuit
        self.failure_count = 0
        self.state = State.CLOSED
        return result
```
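Bypassing a provider only helps if the request has somewhere else to go. Below is a minimal sketch of a fallback chain. It assumes each provider call is wrapped as a zero-argument callable, for example `lambda: anthropic_breaker.call(call_anthropic, prompt)`, where both names are hypothetical. While a breaker is open, its `call` raises immediately, so the loop moves on to the next provider without waiting for a timeout.

```python
from typing import Any, Callable


def call_with_fallback(attempts: list[tuple[str, Callable[[], Any]]]) -> tuple[str, Any]:
    """Try each provider in order; return (provider_name, result) from the first success."""
    errors: dict[str, Exception] = {}
    for name, attempt in attempts:
        try:
            return name, attempt()
        except Exception as exc:  # includes a breaker's fast "circuit open" error
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")


# Stand-ins for breaker-wrapped provider calls (hypothetical)
def primary():
    raise TimeoutError("primary degraded")


def secondary():
    return "ok from secondary"


print(call_with_fallback([("primary", primary), ("secondary", secondary)]))
# → ('secondary', 'ok from secondary')
```

Returning the provider name together with the result makes it easy to record which provider actually served each call.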
6. Model Tiering for Cost (40-60% Savings)
Running a single premium model across all agents is one of the most common cost mistakes. A common production pattern is to use fast, cheap models for triage and routing agents and to reserve capable models for complex reasoning. Depending on how much of the token volume the cheaper tiers can absorb, this can cut costs by 40-60% compared with running a single premium model everywhere.
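```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# task, task_prompt and audit_prompt are placeholders for your own prompt strings

# Cheap model for routing decisions
router_response = client.messages.create(
    model="claude-haiku-4-5-20251001",  # fast, cheap
    max_tokens=256,  # max_tokens is required by the Messages API
    messages=[{"role": "user", "content": f"Which agent should handle: {task}?"}],
)

# Mid-tier for standard agent work
agent_response = client.messages.create(
    model="claude-sonnet-4-6",  # balanced
    max_tokens=4096,
    messages=[{"role": "user", "content": task_prompt}],
)

# Premium only for deep reasoning (architecture, security audit)
audit_response = client.messages.create(
    model="claude-opus-4-8-20260528",  # only when depth matters
    max_tokens=16000,
    # Extended thinking; budget_tokens must be lower than max_tokens
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": audit_prompt}],
)
```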
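The size of the savings depends on where your tokens go, not on how many calls each tier handles. Here is a back-of-the-envelope sketch. The price ratios are illustrative and the token mix is hypothetical; neither reflects real list prices or measured traffic.

```python
def blended_cost(token_share: dict[str, float], relative_price: dict[str, float]) -> float:
    """Cost per token relative to sending everything to the premium tier (= 1.0)."""
    if abs(sum(token_share.values()) - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1")
    return sum(share * relative_price[tier] for tier, share in token_share.items())


# Illustrative ratios only: cheap tier at 5% and mid tier at 25% of the premium per-token price
relative_price = {"cheap": 0.05, "mid": 0.25, "premium": 1.0}

# Hypothetical token mix: routing prompts are short, so most tokens land in mid and premium
token_share = {"cheap": 0.2, "mid": 0.4, "premium": 0.4}

cost = blended_cost(token_share, relative_price)
print(f"relative cost {cost:.2f}, savings {1 - cost:.0%}")
# → relative cost 0.51, savings 49%
```

In this mix the premium tier contributes 0.40 of the 0.51. Moving more tokens from premium to mid therefore changes the total far more than making routing cheaper, because routing only accounts for the 0.01 the cheap tier contributes.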
7. OpenTelemetry for Observability: Now Table Stakes
In 2026, OpenTelemetry is the standard transport and schema for agent observability. Major platforms, including Datadog, New Relic, and LangSmith, ingest OTel traces that follow the GenAI semantic conventions. Instrument against those conventions from the start rather than against vendor-proprietary SDKs. Note that the GenAI conventions are still evolving, so attribute names can change between releases.
The mental model: every user request is a single trace. Each agent invocation, tool call, retrieval, and handoff is a span. Pass trace context along whenever an agent calls another agent or tool. This lets you reconstruct the full execution path of any run from a single trace ID.
```python
import anthropic
from opentelemetry import trace

client = anthropic.Anthropic()
tracer = trace.get_tracer("my-agent-system")


def run_agent(agent_name: str, prompt: str, model: str):
    # One span per agent invocation, nested under the current span (the user request)
    with tracer.start_as_current_span(f"agent.{agent_name}") as span:
        # GenAI semconv attribute names; newer releases replace
        # gen_ai.system with gen_ai.provider.name
        span.set_attribute("gen_ai.system", "anthropic")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.agent.name", agent_name)
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        )
        span.set_attribute("gen_ai.usage.input_tokens", response.usage.input_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", response.usage.output_tokens)
        return response
```
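When one agent hands off to another process, such as an HTTP call or a queue message, the trace context travels as a W3C Trace Context `traceparent` header. With the OpenTelemetry Python API you would normally let `opentelemetry.propagate.inject()` write that header into a carrier dict and let `extract()` read it back. The standard-library sketch below only illustrates what the header carries. The `TraceContext` class and `new_root` helper are hypothetical.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceContext:
    trace_id: str  # 32 lowercase hex chars, shared by every span in one run
    span_id: str   # 16 lowercase hex chars, the span making the outgoing call
    sampled: bool = True

    def to_traceparent(self) -> str:
        # W3C Trace Context format: version-trace_id-parent_id-flags
        flags = "01" if self.sampled else "00"
        return f"00-{self.trace_id}-{self.span_id}-{flags}"

    @classmethod
    def from_traceparent(cls, header: str) -> "TraceContext":
        version, trace_id, span_id, flags = header.strip().split("-")
        if version != "00" or len(trace_id) != 32 or len(span_id) != 16 or len(flags) != 2:
            raise ValueError(f"unsupported traceparent: {header!r}")
        return cls(trace_id, span_id, sampled=(int(flags, 16) & 0x01) == 0x01)

    def child(self) -> "TraceContext":
        # Same trace, new span id; the received span id becomes this span's parent
        return TraceContext(self.trace_id, secrets.token_hex(8), self.sampled)


def new_root() -> TraceContext:
    """Start a new trace for one user request."""
    return TraceContext(secrets.token_hex(16), secrets.token_hex(8))


# Supervisor starts the trace and sends the header with the handoff
root = new_root()
header = root.to_traceparent()
# Worker agent (possibly another process) continues the same trace
worker_ctx = TraceContext.from_traceparent(header).child()
print(worker_ctx.trace_id == root.trace_id)  # → True
```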
8. SHA-256 Response Caching
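Identical LLM calls should never be paid for twice. SHA-256 caching hashes the full request (the model, the messages, and every parameter that affects the output) and stores the response under that hash. The next identical request returns the cached response immediately. Only requests that serialize identically hit the cache. In a 3-pass audit, if the code under review has not changed, the second and third passes send the same prompt and cost nearly nothing. The flip side is that a cached pass returns the same answer, so do not cache calls whose purpose is to sample a fresh response.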
```javascript
import crypto from 'node:crypto';

class ResponseCache {
  constructor(db) {
    this.db = db; // a better-sqlite3 Database instance
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS llm_cache (
        hash TEXT PRIMARY KEY,
        response TEXT NOT NULL,
        provider TEXT,
        created_at INTEGER DEFAULT (unixepoch())
      )
    `);
  }

  // Hash everything that affects the output, not just the messages:
  // pass system prompt, temperature, max_tokens, tools, etc. in `params`.
  // JSON.stringify follows key insertion order, so build objects consistently.
  hashPrompt(messages, model, params = {}) {
    return crypto.createHash('sha256')
      .update(JSON.stringify({ model, messages, params }))
      .digest('hex');
  }

  get(hash) {
    const row = this.db.prepare('SELECT response FROM llm_cache WHERE hash=?').get(hash);
    return row ? JSON.parse(row.response) : null;
  }

  set(hash, response, provider) {
    this.db.prepare(
      'INSERT OR REPLACE INTO llm_cache (hash, response, provider) VALUES (?, ?, ?)'
    ).run(hash, JSON.stringify(response), provider);
  }
}
```
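One subtlety in the JavaScript version is that `JSON.stringify` follows key insertion order. Two logically identical requests built in a different order therefore hash differently, which causes a cache miss rather than a wrong answer. Below is a minimal Python sketch of the same cache-aside flow. It assumes canonical JSON via `sort_keys=True` and uses a stub in place of a real LLM client. `get_or_call` and `fake_llm` are hypothetical names.

```python
import hashlib
import json
import sqlite3
from typing import Any, Callable


def request_hash(request: dict[str, Any]) -> str:
    # sort_keys makes the hash independent of dict key order
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache (hash TEXT PRIMARY KEY, response TEXT NOT NULL)"
        )

    def get_or_call(self, request: dict[str, Any],
                    call: Callable[[dict[str, Any]], Any]) -> Any:
        key = request_hash(request)
        row = self.conn.execute(
            "SELECT response FROM llm_cache WHERE hash = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: no API call
        response = call(request)  # cache miss: pay once
        self.conn.execute(
            "INSERT OR REPLACE INTO llm_cache (hash, response) VALUES (?, ?)",
            (key, json.dumps(response)),
        )
        self.conn.commit()
        return response


# Usage with a stub in place of a real LLM client (hypothetical)
calls = []


def fake_llm(request):
    calls.append(request)
    return {"text": "audit for " + request["messages"][0]["content"]}


cache = ResponseCache(sqlite3.connect(":memory:"))
req = {"model": "example-model", "temperature": 0,
       "messages": [{"role": "user", "content": "diff-123"}]}
first = cache.get_or_call(req, fake_llm)
second = cache.get_or_call(dict(reversed(req.items())), fake_llm)  # same request, keys reordered
print(len(calls))  # → 1
```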
9. SSE Streaming Telemetry
Server-Sent Events (SSE) let a server push updates to a browser over a single long-lived HTTP connection. In agent pipelines, SSE is a good fit for streaming progress to a UI: which agent is running, token counts, and intermediate outputs. It is simpler than WebSockets because data flows one way (server to client). It also works over plain HTTP/1.1, and the browser's EventSource API reconnects automatically when the connection drops.
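```javascript
import express from 'express';
import { EventEmitter } from 'node:events';

const app = express();
// Assumed: the orchestrator calls runEmitter.emit(runId, { type, payload })
const runEmitter = new EventEmitter();

app.get('/runs/:runId/stream', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.flushHeaders(); // send headers now so the client sees the stream open

  const { runId } = req.params;
  const emit = (event, data) =>
    res.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);

  // EventEmitter.on() returns the emitter, not an unsubscribe function,
  // so keep a reference to the listener and remove it with off().
  const onEvent = (event) => {
    emit(event.type, event.payload);
    if (event.type === 'run.complete' || event.type === 'run.error') {
      runEmitter.off(runId, onEvent);
      res.end();
    }
  };
  runEmitter.on(runId, onEvent);

  // Fires when the response ends or the client disconnects
  res.on('close', () => runEmitter.off(runId, onEvent));
});
```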
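For reference, here is the wire format the `emit` helper writes. This minimal Python sketch of an encoder and a simplified parser assumes text frames. Each line of the payload needs its own `data:` field, and a blank line ends the event. `JSON.stringify` output contains no raw newlines, so the single `data:` line in the handler above is safe. Multi-line text payloads are the case to watch. The function names are hypothetical. The parser also ignores `id:`, `retry:`, and comment lines, which a real EventSource client handles.

```python
import json
from typing import Any, Iterator, Optional


def format_sse(event: str, data: Any, event_id: Optional[str] = None) -> str:
    """Encode one SSE frame; every payload line gets its own 'data:' field."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    payload = data if isinstance(data, str) else json.dumps(data)
    for line in payload.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"  # blank line terminates the frame


def parse_sse(stream: str) -> Iterator[tuple[str, str]]:
    """Simplified parser: yields (event, data) per frame; ignores id, retry, and comments."""
    for frame in stream.split("\n\n"):
        if not frame.strip():
            continue
        event, data_lines = "message", []
        for line in frame.split("\n"):
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
        yield event, "\n".join(data_lines)


frame = format_sse("agent.start", {"agent": "coder"})
multi = format_sse("log", "line1\nline2")
print(list(parse_sse(frame + multi)))
```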